Identifying Verbal Collocations in Wikipedia Articles

Authors

  • István Nagy T.
  • Veronika Vincze
Abstract

In this paper, we focus on various methods for detecting verbal collocations, i.e. verb-particle constructions and light verb constructions, in Wikipedia articles. Our results suggest that for verb-particle constructions, POS-tagging combined with a restriction on the particle seems to yield the best results, whereas the combination of POS-tagging, syntactic information and restrictions on the nominal and verbal components has the most beneficial effect on identifying light verb constructions. The identification of multiword semantic units can be successfully exploited in several applications in the fields of machine translation and information extraction.
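As a rough illustration of the idea of POS-tagging plus a restriction on the particle, the sketch below scans already-tagged tokens for a verb followed, within a small window, by a particle from a whitelist. It is not the authors' system: the Penn Treebank-style tags (`VB*` for verbs, `RP` for particles) are an assumed input format, and the `ALLOWED_PARTICLES` set, the `max_gap` window and the function name `find_vpc_candidates` are hypothetical choices made only for this example.

```python
from typing import List, Tuple

# Hypothetical whitelist; the particle list used in the paper is not given here.
ALLOWED_PARTICLES = {"up", "out", "off", "down", "in", "on", "away", "back", "over"}


def find_vpc_candidates(
    tagged: List[Tuple[str, str]], max_gap: int = 2
) -> List[Tuple[str, str]]:
    """Return (verb, particle) pairs from a POS-tagged sentence.

    Assumes Penn Treebank-style tags: verbs start with "VB", particles are "RP".
    At most `max_gap` tokens may stand between the verb and its particle.
    """
    candidates = []
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        # Look ahead over the next max_gap + 1 tokens for a whitelisted particle.
        for j in range(i + 1, min(i + 2 + max_gap, len(tagged))):
            next_word, next_tag = tagged[j]
            if next_tag == "RP" and next_word.lower() in ALLOWED_PARTICLES:
                candidates.append((word, next_word))
                break
            if next_tag.startswith("VB"):
                # Another verb starts a new potential construction; stop here.
                break
    return candidates


sentence = [("He", "PRP"), ("turned", "VBD"), ("the", "DT"),
            ("light", "NN"), ("off", "RP")]
print(find_vpc_candidates(sentence))
```

The whitelist stands in for the "restriction on the particle": a token tagged `RP` that is not in the set is ignored, which trades some recall for fewer false candidates.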


Similar resources

The effect of verbal and visuospatial working memory spans on collocation processing in learners of English

Much interest has recently been directed toward the knowledge of collocations in the field of second language learning since they have been asserted to improve fluency. The current study was intended to examine the effect of verbal and visuospatial working memory spans on the processing of collocations using a Self-Pace Reading Task (SPRT) and relevant working memory tasks. To this end, partici...


Constructing a Collocation Learning System from the Wikipedia Corpus

The importance of collocations for success in language learning is widely recognized. Concordancers, originally designed for linguists, are among the most popular tools for students to obtain, organize, and study collocations derived from corpora. This paper describes the design and development of a collocation learning system that is built from Wikipedia text and provides language learners wit...


Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...


Discourse Connective - A Marker for Identifying Featured Articles in Biological Wikipedia

Wikipedia is a free-content Internet encyclopedia that can be edited by anyone who accesses it. As a result, Wikipedia contains both featured and non-featured articles. Featured articles are high-quality articles and nonfeatured articles are poor quality articles. Since there is an exponential growth of Wikipedia articles, the need to identify the featured Wikipedia articles has become indispen...


The Workshops of the Tenth International AAAI Conference on Web and Social Media

We investigate the automatic generation of Wikipedia articles as an alternative to its manual creation. We propose a framework for creating a Wikipedia article for a named entity which not only looks similar to other Wikipedia articles in its category but also aggregates the diverse aspects related to that named entity from the Web. In particular, a semi-supervised method is used for determinin...




Publication date: 2011